{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install langchain\n",
    "!pip install openai\n",
    "!pip install python-dotenv\n",
    "!pip install faiss-cpu"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dotenv import load_dotenv\n",
    "import os\n",
    "\n",
    "# Laden Sie die Umgebungsvariablen aus der .env-Datei\n",
    "load_dotenv()\n",
    "API_KEY = os.environ.get(\"API_KEY\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loaders  \n",
    "To use data with an LLM, documents must first be loaded into a vector database. \n",
    "The first step is to load them into memory via a loader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import DirectoryLoader, TextLoader\n",
    "\n",
    "loader = DirectoryLoader(\n",
    "    \"./FAQ\", glob=\"**/*.txt\", loader_cls=TextLoader, show_progress=True\n",
    ")\n",
    "docs = loader.load()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Text splitter\n",
    "Texts are not loaded 1:1 into the database, but in pieces, so called \"chunks\". You can define the chunk size and the overlap between the chunks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "\n",
    "text_splitter = RecursiveCharacterTextSplitter(\n",
    "    chunk_size=500,\n",
    "    chunk_overlap=100,\n",
    ")\n",
    "\n",
    "documents = text_splitter.split_documents(docs)\n",
    "documents[0]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Embeddings\n",
    "Texts are not stored as text in the database, but as vector representations.\n",
    "Embeddings are a type of word representation that represents the semantic meaning of words in a vector space."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings import OpenAIEmbeddings\n",
    "\n",
    "embeddings = OpenAIEmbeddings(openai_api_key=API_KEY)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading Vectors into VectorDB (FAISS)\n",
    "As created by OpenAIEmbeddings vectors can now be stored in the database. The DB can be stored as .pkl file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.vectorstores.faiss import FAISS\n",
    "import pickle\n",
    "\n",
    "vectorstore = FAISS.from_documents(documents, embeddings)\n",
    "\n",
    "with open(\"vectorstore.pkl\", \"wb\") as f:\n",
    "    pickle.dump(vectorstore, f)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading the database\n",
    "Before using the database, it must of course be loaded again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"vectorstore.pkl\", \"rb\") as f:\n",
    "    vectorstore = pickle.load(f)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prompts\n",
    "With an LLM you have the possibility to give it an identity before a conversation or to define how question and answer should look like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.prompts import PromptTemplate\n",
    "\n",
    "prompt_template = \"\"\"You are a helpful assistant for our restaurant.\n",
    "\n",
    "{context}\n",
    "\n",
    "Question: {question}\n",
    "Answer here:\"\"\"\n",
    "PROMPT = PromptTemplate(\n",
    "    template=prompt_template, input_variables=[\"context\", \"question\"]\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Chains\n",
    "With chain classes you can easily influence the behavior of the LLM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.llms import OpenAI\n",
    "from langchain.chains import RetrievalQA\n",
    "\n",
    "chain_type_kwargs = {\"prompt\": PROMPT}\n",
    "\n",
    "llm = OpenAI(openai_api_key=API_KEY)\n",
    "qa = RetrievalQA.from_chain_type(\n",
    "    llm=llm,\n",
    "    chain_type=\"stuff\",\n",
    "    retriever=vectorstore.as_retriever(),\n",
    "    chain_type_kwargs=chain_type_kwargs,\n",
    ")\n",
    "\n",
    "query = \"When does the restaurant open?\"\n",
    "qa.run(query)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Memory\n",
    "In the example just shown, each request stands alone. A great strength of an LLM, however, is that it can take the entire chat history into account when responding. For this, however, a chat history must be built up from the different questions and answers. With different memory classes this is very easy in Langchain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.memory import ConversationBufferMemory\n",
    "\n",
    "memory = ConversationBufferMemory(\n",
    "    memory_key=\"chat_history\", return_messages=True, output_key=\"answer\"\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use Memory in Chains\n",
    "The memory class can now easily be used in a chain. This is recognizable, for example, by the fact that when one speaks of \"it\", the bot understands the rabbit in this context."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chains import ConversationalRetrievalChain\n",
    "\n",
    "qa = ConversationalRetrievalChain.from_llm(\n",
    "    llm=OpenAI(model_name=\"text-davinci-003\", temperature=0.7, openai_api_key=API_KEY),\n",
    "    memory=memory,\n",
    "    retriever=vectorstore.as_retriever(),\n",
    "    combine_docs_chain_kwargs={\"prompt\": PROMPT},\n",
    ")\n",
    "\n",
    "\n",
    "query = \"Do you offer vegan food?\"\n",
    "qa({\"question\": query})\n",
    "qa({\"question\": \"How much does it cost?\"})"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
